Support vector machine (SVM) is a well-known method for supervised learning in
sentiment analysis, and many studies have used SVM to classify the sentiments
in lecturer evaluations. SVM has various parameters that can be tuned and
kernels that can be chosen to improve the classifier accuracy. However, not all
options have been explored. Therefore, in this study we compared four SVM
kernels (radial, linear, polynomial, and sigmoid) to discover how each kernel
influences the accuracy of the classifier. To make a proper assessment, we used
our labeled dataset of students’ evaluations of their lecturers. The dataset
was split into two parts, one for training the classifier and one for testing
the model. In addition, we used several training:testing split ratios, with the
training proportion ranging from 0.5 to 0.95 in increments of 0.05. Because the
dataset was split randomly, the splitting, training, and testing processes were
repeated 1,000 times for each kernel and split ratio, which yielded 40,000
accuracy measurements at the end of the experiment. We then applied statistical
tests to see whether the differences are significant. Based on these tests, we
found that in this particular case the linear kernel has significantly higher
accuracy than the other kernels. However, there is a tradeoff: the results vary
more as a higher proportion of the data is used for training.


1. INTRODUCTION


Since its introduction in 1995 [1], the support vector machine (SVM) has become a popular supervised
method in machine learning (ML). The rise of ML as a potential tool for the future [2] has led to growing
use of SVM. For example, SVM regression was used in telecommunication to estimate the channel in multiple
input multiple output (MIMO) orthogonal frequency division multiplexing (OFDM) systems by using an
interpolation mechanism [3]. The study in [4] employed SVM to extract building objects from very
high-resolution satellite images of Tetuan, Morocco, by using spatial and spectral radius, with 83.76% accuracy.
Another use of SVM in image processing can be found in [5], where a multi-class SVM was used
to classify three types of rice grain with 92.22% accuracy. SVM was used in [6] to forecast wind speed
based on the direction, historical data, pressure, moisture, and heat, although it was outperformed by the
other methods in the same study. Recently, SVM was also considered in a study of flood risk prediction [7].

In spite of its accuracy issues in some applications, as found in [6], SVM is still a
preferred method in sentiment and/or emotion analysis [8]. To improve its accuracy, several studies have
examined the kernel trick. For instance, Amari and Wu [9] found that modifying the Gaussian radial basis
function (RBF) kernel brought remarkable improvement. In another study, kernels were compared to find the
best one for emotion recognition based on the processing of physiological signals, and the linear
kernel gave the highest accuracy rate [10]. On the other hand, the work of Asraf and Shah Rizam [11],
which compared SVM kernels for classifying nutrient disease in oil palm leaves, showed that the polynomial
kernel performed best. Similarly, a kernel comparison for termite detection also
found the polynomial kernel to be the best one [12]. Shantini et al. [13] compared SVM kernels for fault
classification in an analog filter circuit. Based on these studies [9-13], each kernel may have a
different impact on the accuracy of the classifier.

Educational sentiment analysis is one field of SVM application. This field commonly addresses
students’ evaluations of their lecturers [14-19]. In one of our former works [14], we applied SVM
to classify the sentiments in lecturer evaluations by students. In [15], Altrabsheh et al.
proposed sentiment analysis for education (SA-E), a system that collects students’ feedback
through social networks and then processes it with SVM and/or naive Bayes. The results in [16]
showed that, when used to classify sentiments within this particular field, SVM was the best method.
Similarly, the study in [17] proposed the use of Twitter for data collection and sentiment
analysis, with SVM classifying the feedback, to improve the teaching-learning process. This study
was later extended in [18]. In the Indonesian context, another study on student feedback toward
lecturers was reported in [19].

There is evidence that different kernels may have different effects on the
accuracy of the classifier, yet studies of SVM kernels for sentiment analysis in education are rare.
Therefore, in this study we compare the accuracy rates yielded by various SVM kernels. In the comparison,
we also include several training:testing data ratios so that we can identify the impact of the
kernels at different training data sizes. The rest of this paper is organized as follows: Section 2
presents the methodology used in this study, including the data source and the stages of the research.
Section 3 presents the results, and Section 4 concludes the paper and presents some
plausible extensions to this study.


2. RESEARCH METHOD 

The course of this study is presented in Figure 1. There are four major phases: data
collection, pre-processing, processing, and post-processing. Generally, it can be seen as a mixture of the steps
used in [18, 20-21], with exceptions in the data collection and post-processing phases: the works in
[18] and [20] used data crawled from Twitter, while the work in [21] used data crawled from news portals.
The final phase, post-processing, refers to a set of statistical tests in which the accuracy data are compared.


[Figure 1 flow diagram: Data Collection -> Pre-processing -> Processing -> Post-processing]

Figure 1. Course of research


2.1. Data collection

The data for this study were acquired from the lecturer evaluations (student feedback) of the spring
semester of 2019. The same data were used in [14]. From May to mid-June, students were asked to fill in an
anonymous online evaluation form for each lecturer they had during the semester that was about to end. This
means a student might have made several submissions. As presented in Figure 1, the feedback is stored in
a database. The collected data consist of 636 feedback entries, with 430 positive, 109 negative, and 97
neutral sentiments. The students used several languages when giving their feedback: most used
standard Bahasa Indonesia, some used English, and some used Bahasa Indonesia mixed with
local diction.


2.2. Pre-processing 

The pre-processing phase, along with the processing phase discussed in the next subsection, is a
common phase in natural language processing (NLP). In this study, the pre-processing phase was done in
several steps. First, each feedback entry was manually labeled. In this step, the collected feedback entries,
usually referred to as documents in the field of sentiment analysis, were manually analyzed and labeled
according to their sentiment. Of the 636 collected documents, 430 are positive, 109 negative, and 97 neutral.
The percentages of the sentiments are shown in Figure 2.


[Figure 2 pie chart: positive 430 (67.61%), negative 109 (17.14%), neutral 97 (15.25%)]

Figure 2. Sentiments composition in the dataset


After all feedback entries were labeled, the letter case in all documents was folded to lowercase. Each
document was then tokenized, where each token is a word. Non-standard word (NSW) handling
is a step that addresses carelessly abbreviated and/or mistyped words, which have been found to be
characteristic of writing in Indonesia [22]. In this step, a customized dictionary containing pairs of
non-standard and standard words was used. Once the NSWs were handled, the stopwords were removed. As
described in Subsection 2.1, the students used various languages, so several stopword
dictionaries were used in this step:

1. The English stopwords from the tidytext package for R [23];
2. The Bahasa Indonesia stopword dictionaries available online in [24-25]; and
3. Our own local stopword dictionary, built from the words used by the students.

After the documents were cleared of stopwords, all numbers and punctuation were removed. The final
step in this phase is feature selection, which in this study is based on word frequency. All words
that appeared fewer than 10 times were deemed insignificant and removed.
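
The pre-processing steps above can be sketched as follows. This is a minimal illustration in Python rather than the R tooling used in the study; the regular-expression tokenizer, the function names, and the tiny NSW_MAP and STOPWORDS dictionaries are assumptions made for the example and do not reproduce the actual dictionaries.

```python
import re
from collections import Counter

# Hypothetical dictionaries for illustration only; the study's actual
# NSW dictionary and stopword lists are not reproduced here.
NSW_MAP = {"dosennya": "dosen", "bgs": "bagus"}
STOPWORDS = {"dan", "yang", "the", "and"}


def preprocess(doc, nsw_map, stopwords):
    """Case-fold, tokenize, normalize NSWs, then drop stopwords and numbers."""
    tokens = re.findall(r"\w+", doc.lower())          # word tokens, punctuation dropped
    tokens = [nsw_map.get(t, t) for t in tokens]      # non-standard -> standard word
    tokens = [t for t in tokens if t not in stopwords]
    tokens = [re.sub(r"\d+", "", t) for t in tokens]  # strip digits
    return [t for t in tokens if t]                   # discard tokens left empty


def select_features(token_docs, min_freq=10):
    """Keep only words that occur at least min_freq times in the corpus."""
    counts = Counter(t for doc in token_docs for t in doc)
    keep = {w for w, c in counts.items() if c >= min_freq}
    return [[t for t in doc if t in keep] for doc in token_docs]


docs = ["Dosennya bgs dan jelas 100%!", "Materi jelas, dosen ramah."]
cleaned = [preprocess(d, NSW_MAP, STOPWORDS) for d in docs]
# min_freq=2 only because this toy corpus is tiny; the study used 10.
features = select_features(cleaned, min_freq=2)
```

In this toy run only "dosen" and "jelas" survive the frequency filter, which mirrors how rarely used words are dropped before the documents reach the classifier.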

Although stemming is commonly used in the pre-processing phase, this step was
omitted in this study, as was done in [14]. The reason is that stemming has been found to have no
significant impact on accuracy when applied to documents in Bahasa Indonesia [26].

The steps within this phase had a great impact on the data. First, as shown in Figure 3(a),
stopwords were among the most frequently used words. The ranking of the most frequent words changed
drastically after the non-standard word handling and stopword removal, as shown in Figure 3(b). This change
is also reflected in the number of words, as shown in Figure 4. Prior to the stopword removal there were
8,446 words, and afterwards only 1,049 words were left. This means that 7,397 stopwords and non-standard
words, or approximately 87.58% of the collected words, had been removed.

Figure 3. The comparison of the 30 most frequent words after tokenization/before NSW handling, and after

Figure 4. Number of words before and after stopword removal


2.3. Processing 

As the objective of this study is to compare the SVM kernels and also the effect of the
training:testing dataset ratio, a major part of this phase was designed to be repeated many times,
so that a large number of samples is obtained for each parameter tested and the law of large numbers
can be satisfied [27]. Therefore, for each pair of kernel and training:testing ratio, the following processes
were repeated 1,000 times:
1. Set the randomization seed from the iterator, so that in the i-th repetition the same training dataset,
and likewise the same testing dataset, was used for every kernel at a given training:testing ratio.

2. Split the dataset for training and testing, with the percentage of the training dataset ranging from 50% to
95% in uniform steps of 5%, in other words 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, and 0.95
of the collected data. The data were randomly split by using the createDataPartition()
function, available in the caret package for GNU R [28]. Each dataset was then transformed into a
document term matrix (DTM) by using the tm package for GNU R [29].

3. Train the SVM classifier on the training dataset by using the e1071 package for R [30].
This step produced an SVM classifier (model) based on the particular training dataset.

4. Test the accuracy of the model on the testing dataset. The model classified each
document in the testing dataset into one of the sentiments. The outputs of the model were then
compared with the real sentiment labels of the documents. This process yielded the accuracy of the
model in percent.
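
Step 3 is the only step where the kernel enters. As a reminder of what is being compared, the four kernels can be sketched in the form commonly used by libsvm, which e1071 interfaces to. The hyperparameter values below (gamma, coef0, degree) are placeholders for illustration; the settings used in this study are not stated in the text.

```python
import math


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def radial_kernel(u, v, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


# Two toy term-count vectors (rows of a document term matrix).
u, v = [1, 0, 2], [0, 1, 2]
similarities = {
    "linear": linear_kernel(u, v),
    "polynomial": polynomial_kernel(u, v, gamma=0.5, coef0=1.0, degree=2),
    "radial": radial_kernel(u, v, gamma=0.5),
    "sigmoid": sigmoid_kernel(u, v, gamma=0.5),
}
```

Each function maps a pair of documents to a similarity score, so changing the kernel changes the geometry in which the classifier looks for a separating boundary.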

After all repetitions for each kernel and training:testing ratio pair had finished, the accuracy data
were collected in a dataset. As there were 4 kernels, 10 training:testing ratios, and 1,000 repetitions,
this dataset contains 40,000 rows of data.
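
The repetition scheme can be sketched as below. This is a hedged Python illustration, not the R code of the study: stratified_split is a stand-in that, like caret's createDataPartition(), samples within each class; majority_baseline is a deliberately trivial placeholder for model fitting (it is not an SVM and ignores the kernel); and all function names are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

KERNELS = ["radial", "linear", "polynomial", "sigmoid"]
RATIOS = [round(0.5 + 0.05 * i, 2) for i in range(10)]  # 0.5, 0.55, ..., 0.95
LABELS = ["positive"] * 430 + ["negative"] * 109 + ["neutral"] * 97


def stratified_split(labels, ratio, seed):
    """Seeded random split that samples the training part within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train = []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        train.extend(idxs[: round(ratio * len(idxs))])
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test


def accuracy(predicted, actual):
    """Share of correct predictions, in percent."""
    return 100.0 * sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def majority_baseline(labels):
    """Placeholder for model fitting: NOT an SVM, and it ignores the kernel."""
    def fit_predict(kernel, train_idx, test_idx):
        top = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        return [top] * len(test_idx)
    return fit_predict


def run_grid(labels, n_reps, fit_predict):
    """One row (kernel, ratio, repetition, accuracy) per experiment."""
    rows = []
    for kernel in KERNELS:
        for ratio in RATIOS:
            for i in range(1, n_reps + 1):
                # The seed depends only on i, so every kernel sees the same split.
                train_idx, test_idx = stratified_split(labels, ratio, seed=i)
                predicted = fit_predict(kernel, train_idx, test_idx)
                actual = [labels[j] for j in test_idx]
                rows.append((kernel, ratio, i, accuracy(predicted, actual)))
    return rows


# n_reps=3 keeps the demo fast; the study used 1,000 repetitions.
rows = run_grid(LABELS, n_reps=3, fit_predict=majority_baseline(LABELS))
```

With n_reps set to 1,000 the same loop produces 4 x 10 x 1,000 = 40,000 rows, matching the size of the accuracy dataset described above.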


2.4. Post-processing 

The term post-processing was chosen to represent the analysis phase done after the SVM
processing. This phase mainly covers the statistical procedures used to compare the tested
parameters, namely the kernels and the ratio between the training and testing datasets. The main step within
this phase is the variance analysis, which uses a multivariate test. In this step, the data were compared to
find any significant variance due to the effect of the parameters. However, to decide which multivariate
method to use, the normality of the distribution must be identified first. If the variance analysis indicated
any significant variance, it was followed by a post-hoc test. The results of this phase are discussed
comprehensively in Section 3. For all statistical tests, α = 0.05 was adopted. The GNU R statistical
software [31] was used to conduct these tests.


3. RESULTS AND DISCUSSION 

To provide a general overview, we plotted the accuracy of each experiment as a scatter plot,
grouped by the proportion (ratio) of the training data, as shown in Figure 5. For each kernel type, a
line connects the mean accuracy at each training ratio. It is clear that at every ratio of
training data to testing data, the linear kernel has higher average accuracy than the other kernels. However,
several accuracy points of the linear kernel lie far from the mean, although this is a common phenomenon
in all kernel types as the ratio of training data increases.


[Figure 5 scatter plot: Accuracy (y-axis) versus Training Ratio (x-axis, ticks 0.5 to 0.9), with one mean line per Kernel Type: radial, linear, polynomial, sigmoid]

Figure 5. The accuracy of each experiment. Each line represents and connects the mean of the accuracy, for
each ratio of the training data


The boxplots in Figure 6 confirm that the linear kernel tends to yield higher accuracy, judging by the
vertical positions of the boxes in Figure 6(a). When the boxplots of the accuracy are grouped by kernel, as
shown in Figure 6(b), the higher variation at higher training ratios can be seen more clearly.
Although the linear kernel tends to have higher variation, the shape of its boxes
and the location of its medians are generally balanced, with the median line located around the center.
For the other kernels, the median is sometimes located near one of the quartiles (i.e., the polynomial kernel
at training ratios 0.8 and 0.95, while at training ratio 0.85 the median is closer to the first quartile; and
the radial kernel at training ratios 0.6, 0.8, 0.85, and 0.9).

Figure 6. Boxplot of the accuracy from each experiment.


Figure 7 gives a visual comparison of the standard deviation, where Figure 7(a) and Figure 7(b)
show comparisons grouped by the ratio of training data and by kernel, respectively. It is very clear that the
accuracy of the linear kernel always has a higher standard deviation than the other kernels, followed by the
radial kernel. The standard deviations of the polynomial and sigmoid kernels tend to coincide at lower
training ratios. However, as a higher proportion of the data is used for training, the polynomial kernel has
a lower standard deviation than the sigmoid kernel, as shown in Figure 7(b).

Figure 7. Boxplot of the accuracy from each experiment
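
The per-group mean and standard deviation behind such a comparison can be computed as in the following sketch. The accuracy values are made up for illustration and are not results from this study; the function name summarize is likewise an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(rows):
    """Mean and sample standard deviation of accuracy per (kernel, ratio) group."""
    groups = defaultdict(list)
    for kernel, ratio, acc in rows:
        groups[(kernel, ratio)].append(acc)
    return {key: (mean(values), stdev(values)) for key, values in groups.items()}


# Made-up accuracy values (percent), not data from the study.
rows = [
    ("linear", 0.5, 80.0), ("linear", 0.5, 82.0), ("linear", 0.5, 84.0),
    ("radial", 0.5, 70.0), ("radial", 0.5, 70.0),
]
summary = summarize(rows)
```

Grouping by (kernel, ratio) gives the values for both views at once: reading across kernels at a fixed ratio, or across ratios for a fixed kernel.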


3.1. Distribution normality

The distribution normality test was applied to each kernel and training:testing ratio pair by using the
Shapiro-Wilk method [32]. The resulting p-values are presented in Table 1. As described in Subsection 2.4,
the threshold for the p-value is 0.05, hence H0 was rejected in most cases. As not all data are normally
distributed, the subsequent tests must use non-parametric methods.
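
The decision rule in this paragraph can be written down explicitly. The sketch below is only an illustration: the p-values are made up, the function name is hypothetical, and naming one-way ANOVA as the parametric alternative is our assumption, since the text does not state which parametric test would have been used.

```python
def choose_test(shapiro_p_values, alpha=0.05):
    """Pick the family of follow-up tests from per-group normality p-values."""
    rejected = sorted(g for g, p in shapiro_p_values.items() if p < alpha)
    if rejected:
        # At least one group departs from normality.
        return "non-parametric (Kruskal-Wallis)", rejected
    # Assumed parametric alternative; not named in the study.
    return "parametric (e.g. one-way ANOVA)", rejected


# Made-up p-values keyed by (kernel, training ratio); not values from Table 1.
p_values = {("linear", 0.5): 0.20, ("radial", 0.5): 0.001, ("sigmoid", 0.5): 0.04}
method, non_normal_groups = choose_test(p_values)
```

A single rejected group is enough to switch the whole analysis to the non-parametric family, which is the reasoning applied to the 40 groups here.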


Table 1. Results from the Shapiro-Wilk test 


P-value for each ratio of the training data 


Kernel Type 0.5 0.55 0.6 0.65 0.7
radial 38.01x102  1.092x10  7944x10% 1497x110 1533x105 
linear 1685x103 6.788x 103  45.06x 103 3.802x103 2.429x 103 
polynomial 326.1x 1027 5.852x10% 1.753x 102! 5837x108 ~— 30.10 x 10-5 
sigmoid 452.4x10%  17.29x102! —128.4x 1078 ~—542.5x 10° 20.28 x 10° 

0.75 0.8 0.85 0.9 0.95 
radial 1183x10"  4771x10"  1405x10" 7598x102 1205x108 
linear 5604x106  451.0x10%  158.0x10%  6.702x10® 27.98 x 10" 
polynomial 843.2x 1075 157.0x 1012 79.26x 1012 = 589.4x 105 ~—-2.831 x 10"8 
sigmoid 13.29x 109 43.61x 109 —-323.7x 102 6.470 x 102, 5.778 x 10°18 


3.2. Multivariate analysis 

To understand the effects of the kernel and of the training data ratio, each
scenario was tested separately. First, we compared the ratios of training data within the same kernel; second,
we compared the kernels at each ratio of the training data. As the data are not normally distributed, we
applied the two-tail Kruskal-Wallis test in this step [33].
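
For readers unfamiliar with the test, a minimal pure-Python sketch of the Kruskal-Wallis H statistic follows. It uses the usual tie correction and the chi-square approximation for the p-value; the helper names are assumptions, the toy groups are not data from this study, and the study itself ran the test in R.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    z = x / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, one list per group."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0:  # every observation identical: no evidence of difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


h, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

In the study each group holds the 1,000 accuracy values of one kernel (or one training ratio), so the test has 3 degrees of freedom when comparing kernels and 9 when comparing ratios.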


3.2.1. The ratio of the training data 

Based on the p-values in Table 2, the results are significant, since all values are less than 0.05. The
p-value for the polynomial kernel is not actually equal to 0; however, the value is too small for the
computer to display. Therefore, it can be said that for every kernel tested,
at least one training ratio gave significantly different accuracy.


Table 2. Results of the Kruskal-Wallis test for the ratio of training data comparison in each kernel 


Kernel Type p-value 
radial 585.4x10°797 
linear 567.5x1071% 

polynomial < 0.0001 

sigmoid 8.806x107135 


3.2.2. Kernels 

Similar to the comparison of the training data ratios within the same kernel, when the kernels at each
ratio are compared with one another, the differences are also significant, as shown in Table 3. Hence, for each
proportion of the dataset used for training, at least one kernel had significantly different accuracy.


Table 3. Results of the Kruskal-Wallis test for the kernel comparison of each ratio of training data 


Ratio of Training Data p-value 
0.5 415.0x101 
0.55 8.39x 10-198
0.6 1.22x 10-204 
0.65 91.80x101% 
0.7 648.0x10°278 
0.75 1.32x 10-240 
0.8 1.20x 10-174 
0.85 916.0x10'4 
0.9 17.30x10-%3 
0.95 143.0x10-*° 


3.3. Post-hoc analysis

The post-hoc analysis was done to compare pairs of experiments in the same group. Based on the
pairwise tests, it is possible to find the pairs that are significantly different. The method used in this step
was a two-tail Dunn test with the Benjamini-Hochberg adjustment. Since too many combinations were tested,
only the results with insignificant differences (p-value > 0.05) are shown in Table 4, to save space.
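
The post-hoc procedure can be sketched as follows: a Dunn z statistic on pooled mean ranks, a two-sided p-value from the normal distribution, and the Benjamini-Hochberg step-up adjustment. The function names and toy groups are assumptions for illustration; the study ran the test in R.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def two_sided_p(z):
    """Two-sided p-value of a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values):
    """Adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def dunn_test(groups):
    """groups: dict name -> sample. Returns rows (a, b, z, p_unadj, p_adj)."""
    names = list(groups)
    pooled = [x for name in names for x in groups[name]]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_rank, start = {}, 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n * (n + 1) / 12.0 - ties / (12.0 * (n - 1))
    raw = []
    for a, b in combinations(names, 2):
        se = math.sqrt(variance * (1.0 / len(groups[a]) + 1.0 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        raw.append((a, b, z, two_sided_p(z)))
    adjusted = benjamini_hochberg([row[3] for row in raw])
    return [row + (q,) for row, q in zip(raw, adjusted)]


results = dunn_test({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
```

As a sanity check on the two-sided convention, two_sided_p(-1.89) is about 0.06, which agrees with the Z and P.unadj values reported in the first row of Table 4.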


Table 4. Selected results from the Dunn test for the impact of the kernel on the accuracy, for every ratio
of the training dataset


Training Ratio   Comparison            Z       P.unadj   P.adj
0.80             polynomial - radial   -1.89   0.06      0.06
0.85             polynomial - radial   -0.75   0.45      0.45
0.90             polynomial - radial    1.61   0.11      0.11


3.3.1. Kernel 

The results of the Dunn test for the comparison of the kernels are shown in Table 4. Based on the
adjusted p-values, only the polynomial and radial kernels have several insignificant differences.
Therefore, it can be said that most kernels give significantly different accuracy. Moreover, the cases where
the differences are not significant occurred at training ratios within a very close range.


Table 5. Selected results from the Dunn test for the impact of the ratio of the training dataset on the
accuracy, for every kernel type


Kernel Type   Comparison    Z       P.unadj   P.adj
radial        0.5 - 0.55    -1.24   0.22      0.23
radial        0.65 - 0.75   -1.57   0.12      0.13
radial        0.7 - 0.75     0.97   0.33      0.34
radial        0.8 - 0.85     0.53   0.59      0.59
linear        0.6 - 0.65    -1.51   0.13      0.14
linear        0.75 - 0.8    -1.00   0.32      0.33
linear        0.85 - 0.9    -0.49   0.62      0.62
linear        0.75 - 0.95   -1.58   0.11      0.13
linear        0.8 - 0.95    -0.58   0.56      0.57
linear        0.85 - 0.95    1.66   0.10      0.11
polynomial    0.5 - 0.55    -1.23   0.22      0.23
polynomial    0.6 - 0.65    -1.98   0.05      0.05
polynomial    0.7 - 0.75    -1.04   0.30      0.31
polynomial    0.8 - 0.85    -0.27   0.79      0.79
polynomial    0.9 - 0.95    -1.64   0.10      0.11
sigmoid       0.5 - 0.55    -0.06   0.95      0.97
sigmoid       0.6 - 0.65    -0.86   0.39      0.44
sigmoid       0.6 - 0.7     -0.85   0.40      0.44
sigmoid       0.65 - 0.7     0.02   0.99      0.99
sigmoid       0.6 - 0.75     0.35   0.72      0.78
sigmoid       0.65 - 0.75    1.22   0.22      0.27
sigmoid       0.7 - 0.75     1.20   0.23      0.27
sigmoid       0.8 - 0.85    -0.25   0.80      0.84
sigmoid       0.9 - 0.95    -1.73   0.08      0.10


3.3.2. The ratio of the training data 

In total, for each kernel type there are 45 comparisons of the training ratio, and when
accumulated, the result of the Dunn test consists of 186 rows. Therefore, only the comparisons with
insignificant differences are shown in Table 5. The table confirms that when the ratios are too close together,
such as 0.5 and 0.55, the results are not significantly different. The sigmoid and then the linear kernel are
the two least affected by the proportion of the data used for training. In contrast, the radial kernel is in this
case the most affected by the ratio, since only 4 of its 45 comparisons are not significantly different.


4. CONCLUSION AND FUTURE WORK 

Sentiment analysis with the SVM classifier has been studied as a solution to the tiresome work
of manual sentiment classification. Student evaluation of lecturers is one field where it has been
applied. SVM has several options that can be tuned to increase its accuracy, such as the choice of
kernel. Yet in sentiment analysis of students’ evaluations, this option is rarely explored. In this study, we
have specifically evaluated the effect of various kernels and of the proportion of data used for training on
the accuracy of the classification. Similar to the findings of previous studies about SVM kernels, this study
found that the use of different kernels affects the accuracy of the SVM classifier, which in this
case was trained specifically for the sentiment analysis of lecturer feedback from students. Only the
polynomial and radial kernels, at certain proportions of the dataset used for training, show no
significant difference in accuracy. We also studied the impact of the dataset
proportion used for SVM training. Based on the boxplots, a higher proportion of data used for training gives
higher variation in the accuracy of the model. This means the model becomes less reliable as the proportion of
the training dataset increases. However, in the post-hoc test, the differences are not always significant. The
sigmoid and then the linear kernel are the two kernels with the fewest significant differences between
close-range training ratios. This study has evaluated the use of kernels and the proportion of the dataset
used for training in SVM-based sentiment analysis of students’ feedback toward lecturers. Future work must
address the question of which kernel is better, as well as the right proportion of the training dataset, for
this particular case.


Effects of kernels and the proportion of training data on the accuracy of... (Daniel Febrian Sengkey)

742 ISSN: 2252-8938